An Introduction to SQL Relationships

As I've mentioned, SQL databases allow for you to build relationships between the tables of your database. This is handled through Foreign Keys, which is a new kind of Column we haven't yet seen. Effectively, a Foreign Key is a piece of information (usually the integer primary key) that uniquely identifies the object with which the current object has a relationship. There are several key permutations of relationships that you can use:

  • One to one
  • One to many
  • Many to many </table> The relationship types are pretty self-explanatory. A one to one relationship links a specific object, let's say a Star, to another specific object, let's say a SkyCoordinates object. Each Star has exactly one SkyCoordinates object, and each SkyCoordinates object has exactly one Star.
  • A one to many relationship links a specific object, let's say Photometry, to another, again a Star, just like in a one to one relationship; however, the linked object is allowed to be related to more than one of the original object. In this case, a Star can have more than one Photometry object, but each Photometry object is associated with exactly one Star.

    Finally, a many to many relationship links two types of objects, let's say Tag and Star. The Star object can have multiple Tags. But you likely will want to treat these like categories, so each Tag should be able to associate with multiple Stars as well.

    Perhaps the obvious question is, 'Why aren't we starting with one to one? It's probably easier.' And you'd be almost right. The issue with one to one relationships is that they normally aren't that useful, and so they're usually implemented as a slight alteration of a one to many relationship in which the 'many' part is restricted to be 'one.' Note in the above example that I used a Star object and a SkyCoordinates object; whereas, in the previous notebook, I suggested you include the coordinates in the Star object. In general, with any one to one relationship, the contents of the related object can simply be added as additional columns in the original, making them not terribly useful.

    One to Many Relationships in SQLAlchemy

    Before we get started building relationships, we'll need some objects with which to build relationships. Copy over the Star object from the previous notebook below. Make sure to recreate the engine and the Base class from the previous notebook.

    
    
    In [1]:
    from sqlalchemy import create_engine, Column, Integer, String, Float
    from sqlalchemy.ext.declarative import declarative_base
    
    engine = create_engine ('sqlite:///:memory:', echo = False)
    Base = declarative_base()
    
    
    
    In [2]:
    
    

    Now let's generate that Photometry object I mentioned before. Give it the following columns: a string column named filter; a float column named mag; a string column named comments. Don't add this to the engine just yet. We'll still need to create the relationship.

    
    
    In [3]:
    
    

    To create the relationship, we need to add a ForeignKey column to the object. Normally you would just add the additional column information into the class definition directly, but you can also modify the class after the fact.

    
    
    In [4]:
    from sqlalchemy import ForeignKey
    from sqlalchemy.orm import relationship, backref
    
    Photometry.star_id = Column (Integer, ForeignKey("stars.id")) # This is the actual ForeignKey column
    Photometry.star = relationship ("Star", backref = backref ("photometry_set")) # This 
        # will let you type photometry.star or star.photometry_set and get the actual related objects.
        # I chose "photometry_set", but you can use whatever valid Python variable name you like
    

    Now, you can add the class to the engine, as before, and don't forget to create the Session class.

    
    
    In [5]:
    
    

    Assuming you've done everything correctly, you've now set up a one to many relationship through SQLAlchemy. Rather than continuing with our toy stars, let's load some real data. I've collected a bunch of stars for you from the SDSS DR7 in M13.

    
    
    In [6]:
    import numpy as np
    
    session = Session ()
    
    data = np.genfromtxt ("sdss-dr7-m13.csv", names = True, delimiter = ',', dtype = None)
    

    Let's look at one of these stars:

    
    
    In [7]:
    filters = ('u', 'g', 'r', 'i', 'z')
    
    datum = data [0]
    print ("ID", datum ["objid"])
    print ("RA", datum ["ra"])
    print ("DEC", datum ["dec"])
    print ()
    for filter in filters:
        print (filter, datum [filter])
    
    
    
    
    ID 587733609092088934
    RA 250.01783886
    DEC 36.45578947
    
    u 25.488085
    g 23.408033
    r 22.248293
    i 22.402548
    z 21.924389
    

    Now, make a database entry for this star, called star. Also make five Photometry objects called uPhot, gPhot, rPhot, iPhot, and zPhot for that star.

    
    
    In [9]:
    
    

    Now, we can declare the relationship between our objects. Thanks to that work we did earlier, this is a fairly trivial task.

    
    
    In [10]:
    uPhot.star = star
    gPhot.star = star
    rPhot.star = star
    iPhot.star = star
    zPhot.star = star
    

    Don't forget to add star to the session and commit the session to the database. Note: any objects related to an object added to the session are automatically added, so you don't need to add all the photometry objects to the database manually. This is due to a process called cascading, which can be extremely powerful, but which I won't address much here. You should also check that the photometry objects were indeed added by querying the database.

    
    
    In [11]:
    session.add (star)
    session.commit ()
    
    session.query (Photometry).all ()
    
    
    
    
    Out[11]:
    [<__main__.Photometry at 0x113242fd0>,
     <__main__.Photometry at 0x113242f98>,
     <__main__.Photometry at 0x11326e9b0>,
     <__main__.Photometry at 0x11326ea20>,
     <__main__.Photometry at 0x11326ea90>]

    Well done. Now do it for the rest of the objects in the file. Warning: Make sure not to readd the star you just added above. Remember, we required the object id to be unique (and for good reason). If you accidentally readd it, you'll get an integrity error. Roll back and try again.

    
    
    In [12]:
    
    

    Nicely done! You've set up a relational database with stars and photometry from SDSS. A comment about style: we decided to make the photometry its own object rather than folding it into the Star object. This is for a couple reasons: if we want to track the photometry over time, we'll need to have multiple values for each filter. This brings up an interesting point: you can't put an array into a database column. If you want to have multiple values of a particular thing, you'll need to set up a new table and add a one to many relationship from there. In addition, we can add an arbitrary number of filters to each object. If we had declared the columns within the star class, we would be restricted to those filters.

    On the other hand, this method is less space efficient than including the columns directly in the Star object, so there is a trade off. If you know your filters ahead of time and know there will only be one value for each, use the space efficient method.

    Querying with Relationships: Joining Queries

    Now that you've got a database with relationships, you'll likely want to know how to query them. This is done through a join command. Effectively, this lets you query by the properties of both the Star and Photometry objects simultaneously. The syntax is as follows:

    
    
    In [13]:
    query = session.query (Star).join (Photometry)
    

    This is a query for a set of Star objects, but if you filter by any property of the Photometry object, only stars with a photometry object that matches ALL your filters will be returned. Try to count all the stars with g magnitude less than 14.5. Feel free to use the query object you created above to avoid having to type out the whole query and join syntax.

    
    
    In [20]:
    
    

    In addition to the normal boolean operators that can be used in filters, there are several additional operations that can assist with narrowing queries. Say you want to return all stars that have an observation in g (which in this case is all of them. You could do it as follows.

    
    
    In [15]:
    session.query (Star).filter (Star.photometry_set.any (filter = "g")).count ()
    
    
    
    
    Out[15]:
    1478

    You could also get all the photometry for a specific object using has:

    
    
    In [17]:
    phots = session.query (Photometry).filter (Photometry.star.has (name = "587733609092088934")).all ()
    

    These may not be exceptionally useful in our pretty simple database, but they're good to keep in mind as the database gets more complicated. I'll show you how to use the contains method in the many to many notebook.

    There is a lot you can do with these joining queries, and I've only covered a few topics. To see more detail, check out the SQLAlchemy ORM tutorial. If you're ready to move on to learning many to many relationships, go to the many to many notebook. If you'd rather see some practical applications, go to the practical examples. If you want to learn more about one to many relationships, keep reading.

    
    
    In [18]:
    
    

    Cascading

    Cascading is an exceptionally useful tool in relational databases. Effectively, this means that when you do something to an object, like create it or delete it, do something to all the objects related to it. Let's try deleting one of the stars in the database:

    
    
    In [19]:
    session.delete (star)
    

    All right, that star has been deleted, but what happened to the Photometry objects that were connected to it? I'll spoil the ending for you: they're still in the database. We can see this by querying the database for Photometry objects that have no star object, i.e. orphaned objects.

    
    
    In [20]:
    session.query (Photometry).filter (Photometry.star == None).count ()
    
    
    
    
    Out[20]:
    5

    The easiest way to fix this is to change the way the relationship is set up in the classes. We can add the cascade argument in two places in the relationship. If placed directly in the relationship command, whenever the current object is deleted (the Photometry), the related object (the Star) will be. This is not the behavior we desire. We can also place the cascase in the backref, in which cause the reverse will happen. The cascade argument takes a comma-separated list of commands that indicate the behavior of the children objects when the parent changes. The delete-orphan option will make sure that when a Star is deleted, all its children will be, too.

    Try copying this Photometry definition up to your Photometry definition and restarting the notebook. See whether those orphaned photometry objects still exist after you delete star.

    
    
    In [2]:
    from sqlalchemy import ForeignKey
    from sqlalchemy.orm import relationship, backref
    
    class Photometry (Base):    
        __tablename__ = 'photometry'
        
        id = Column (Integer, primary_key = True)
    
        filter = Column (String)
        mag = Column (Float)
        date = Column (Float)
        comments = Column (String)
        
        star_id = Column (Integer, ForeignKey("stars.id"))
        star = relationship ("Star", backref = backref ("photometry_set", cascade="all, delete-orphan"))
    
    
    
    In [ ]: